Learning the Structure of Biomedical Relationships from Unstructured Text
Abstract
The published biomedical research literature encompasses most of our understanding of how drugs interact with gene products to produce physiological responses (phenotypes). Unfortunately, this information is distributed throughout the unstructured text of over 23 million articles. The creation of structured resources that catalog the relationships between drugs and genes would accelerate the translation of basic molecular knowledge into discoveries of genomic biomarkers for drug response and prediction of unexpected drug-drug interactions. Extracting these relationships from natural language sentences on such a large scale, however, requires text mining algorithms that can recognize when different-looking statements are expressing similar ideas. Here we describe a novel algorithm, Ensemble Biclustering for Classification (EBC), that learns the structure of biomedical relationships automatically from text, overcoming differences in word choice and sentence structure. We validate EBC's performance against manually curated sets of (1) pharmacogenomic relationships from PharmGKB and (2) drug-target relationships from DrugBank, and use it to discover new drug-gene relationships for both knowledge bases. We then apply EBC to map the complete universe of drug-gene relationships based on their descriptions in Medline, revealing unexpected structure that challenges current notions about how these relationships are expressed in text. For instance, we learn that newer experimental findings are described in consistently different ways than established knowledge, and that seemingly pure classes of relationships can exhibit interesting chimeric structure. The EBC algorithm is flexible and adaptable to a wide range of problems in biomedical text mining.
Similar resources
A Hybrid Approach to Extract and Classify Relation from Biomedical Text
Unstructured biomedical text is a key source of knowledge. Information extraction in the biomedical domain is a complex task due to the high volume of data. Manual efforts produce the best results; however, this is a nearly impossible task for such a large amount of data. Thus, there is a need for tools and techniques to extract information from biomedical text automatically. Biomedical text contains relatio...
A Framework for Schema-Driven Relationship Discovery from Unstructured Text
We address the issue of extracting implicit and explicit relationships between entities in biomedical text. We argue that entities seldom occur in text in their simple form and that relationships in text relate the modified, complex forms of entities with each other. We present a rule-based method for (1) extraction of such complex entities and (2) relationships between them and (3) the convers...
Knowledge Management for Biomedical Literature: the Function of Text-mining Technologies in Life-science Research
Efficient information retrieval and extraction is a major challenge in life-science research. The Knowledge Management (KM) for biomedical literature aims to establish an environment, utilizing information technologies, to facilitate better acquisition, generation, codification, and transfer of knowledge. Knowledge Discovery in Text (KDT) is one of the goals in KM, so as to find hidden informat...
Using Deep Learning Towards Biomedical Knowledge Discovery
A vast amount of knowledge exists within biomedical literature, publications, clinical notes and online content. Identifying hidden, interesting or previously unknown biomedical knowledge from free text resources using an automated approach remains an important challenge. Towards this problem, we investigate the use of deep learning methods that have shown significant promise in identifying hid...
Biomedical Text Mining: State-of-the-Art, Open Problems and Future Challenges
Text is a very important type of data within the biomedical domain. For example, patient records contain large amounts of text which has been entered in a non-standardized format, consequently posing a lot of challenges to processing of such data. For the clinical doctor the written text in the medical findings is still the basis for decision making – neither images nor multimedia data. However...